Abstract

In this article we discuss six degrees of separation, proposed by Milgram, from a theoretical
point of view. Naively, if a person has k friends, the number N of indirect friends grows as
$\sim k^d$ within d degrees of separation, so it would easily reach the population of the whole
world. That estimate, however, cannot be accepted as it stands: mainly because of a nonzero
clustering coefficient C, N does not grow as $\sim k^d$. In this article, we first discuss the
relation between six degrees of separation and the clustering coefficient in the small world
network proposed by Watts and Strogatz [2],[3]. In particular, the conditions under which N
exceeds the population of the U.S.A. or of the whole world in the WS model are explored from
theoretical and numerical points of view. Secondly, we introduce an index that represents the
velocity of propagation to the number of friends and obtain an analytical formula for it as a
function of C, of K, which is the average degree over all nodes, and of a parameter P concerned
with network topology. Finally, the index is calculated numerically to study the relation
among C, K, P and N.

Keywords: Six Degrees of Separation, Small World Network, Propagation Coefficient,
Watts-Strogatz Model, Clustering Coefficient, Average Path Length

1 Introduction 

In 1967, Milgram made a great impact on the world by advocating the concept of "six degrees of
separation" on the basis of a social experiment reported in a celebrated paper. "Six degrees of
separation" indicates that the world of acquaintances is surprisingly small: a series of social
experiments made by him suggests that all people in the USA are connected through about 6
intermediate acquaintances. In this paper we argue, from a rather theoretical point of view, that
this phenomenon, the so-called "six degrees of separation", is not so surprising and is, if
anything, natural.

This article is first motivated by the following simple consideration. If every person has K
acquaintances, then after L steps of intermediate acquaintances a person would be able to convey
a piece of mail, or information in general, to about $S \sim K^L$ persons. This holds with the
proviso that the network of acquaintances has a tree structure without any loop (zero clustering
coefficient). Evaluating it in more detail, it is

$$S = \sum_{i=0}^{L-1} K (K-1)^i, \qquad (1)$$

where it is assumed that the relation of acquaintances is symmetric. Thus information spreads
from one person to exponentially many persons within L steps. A person who receives information
may, of course, convey it to only a part of his/her acquaintances; nevertheless, when a network
has a tree structure, six degrees of separation between any two persons would not be
so mysterious.
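
The tree estimate can be checked with a few lines of Python. This is only a sketch under the assumption stated in the text (symmetric acquaintances on a loop-free network), so that the first step reaches K persons and every later step multiplies the newly reached persons by K - 1; the function name `tree_reach` is a hypothetical helper, not part of the original analysis.

```python
def tree_reach(K: int, L: int) -> int:
    """Number of persons reached within L steps on a loop-free (tree) network.

    Assumes every person has K symmetric acquaintances: the first step reaches
    K persons and each later step multiplies the newly reached persons by
    K - 1, because one of the K links leads back to the sender.
    """
    if K < 1 or L < 0:
        raise ValueError("K must be >= 1 and L >= 0")
    reached = 0
    newly_reached = K
    for _ in range(L):
        reached += newly_reached
        newly_reached *= K - 1
    return reached


if __name__ == "__main__":
    # Compare the detailed count with the rough estimate K**L at L = 6.
    for K in (10, 20, 50):
        print(K, tree_reach(K, 6), K ** 6)
```

For a few tens of acquaintances the count at L = 6 is already of the same order as $K^6$, which is why the tree picture alone makes six degrees of separation look unsurprising.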

Real networks, however, naturally have structures with loops. If there are loops, that is, an
effective clustering coefficient, in the network of acquaintances, this discussion is greatly
altered. One of the aims of this article is to evaluate the effect of the clustering coefficient
on the propagation of information: how much does the clustering coefficient reduce the population
to which information is provided? We consider it from the following three points of view.

1. To inspect six degrees of separation for the Watts-Strogatz type model, where analytic
expressions for the clustering coefficient and the average path length are found.

2. To inspect six degrees of separation for more general small world networks with uniform
clustering coefficient.

3. To consider empirical networks such as a network in Mixi and a network of actors/actresses
based on data of the Bacon game.

The plan of this article is as follows. In the next section we discuss Watts-Strogatz type
small world networks [3], for which theoretical evaluations of the average path length and the
clustering coefficient have been made. We study the propagation of information on these networks
by numerical analyses based on these formulae. To study more general small world networks, we
present a propagation coefficient model in section 3. At present, however, we cannot analyze a
large class of small world networks, including scale free networks, for technical reasons. We
adopt a homogeneous hypothesis, which is explained in detail in section 3; various types of
homogeneity are postulated in the propagation coefficient model. In section 4, we analyze
empirical data, such as a network in Mixi and a network based on data of the Bacon game, to
estimate the clustering coefficient. Concluding remarks are given in the last section.

2 Watts-Strogatz Model 

In this section we investigate the effect of the clustering coefficient on the diffusion of
information in Watts-Strogatz type small world networks. Analytic expressions for the clustering
coefficient and the average path length in these networks have been found. For the original
Watts-Strogatz version of the model, the clustering coefficient C(p) has been given [18] by

$$C(p) = \frac{3(K-2)}{4(K-1)} (1-p)^3, \qquad (2)$$

where n is the network size, that is, the number of nodes on the network, K is the average degree
and p is the rewiring probability. Moreover, the average node-node separation L(p) in a modified
version of the original Watts-Strogatz model has been found in the limit of low density of
shortcuts;

$$L(p) = \frac{2n}{K} F\left(\frac{nKp}{2}\right), \qquad F(x) = \frac{1}{2\sqrt{x^2+2x}} \tanh^{-1}\sqrt{\frac{x}{x+2}}, \qquad \text{for small } p. \qquad (3)$$

In the modified version of the original small world model, shortcut edges are added between
randomly chosen node pairs and no bonds are removed. Edges are thus not rewired but added, which
ensures that the modified network stays connected. We attempt analyses of
six degrees of separation based on these two formulae.
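
The two formulae are easy to evaluate numerically. The sketch below is an illustration only: it assumes the clustering coefficient has the form $C(p) = 3(K-2)(1-p)^3 / (4(K-1))$ and the average separation has the form $L(p) = (2n/K) F(nKp/2)$ with $F(x) = \tanh^{-1}\sqrt{x/(x+2)} / (2\sqrt{x^2+2x})$, both written in terms of the average degree K. The function names are hypothetical, and the exact prefactors should be checked against the cited references before any quantitative use.

```python
import math


def ws_clustering(K: float, p: float) -> float:
    """Clustering coefficient C(p) of the rewired ring (assumed functional form)."""
    return 3.0 * (K - 2.0) / (4.0 * (K - 1.0)) * (1.0 - p) ** 3


def scaling_function(x: float) -> float:
    """F(x) = atanh(sqrt(x / (x + 2))) / (2 sqrt(x^2 + 2x)), defined for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    return math.atanh(math.sqrt(x / (x + 2.0))) / (2.0 * math.sqrt(x * x + 2.0 * x))


def ws_path_length(n: float, K: float, p: float) -> float:
    """Average separation L(p) for added shortcuts at small p (assumed form)."""
    return (2.0 * n / K) * scaling_function(n * K * p / 2.0)


if __name__ == "__main__":
    n = 1e9  # population used in the text
    for p in (0.005, 0.01, 0.02, 0.04):
        print(f"p={p}: C={ws_clustering(20, p):.3f}, L={ws_path_length(n, 20, p):.2f}")
```

Under these assumptions a larger p lowers both C and L, which is the trade-off explored in the following subsections.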

2.1 Parameter Regions for Six Degrees of Separation 

In this subsection, we explore the regions of the parameters introduced above that are compatible
with six degrees of separation. First we look for the parameter regions in which information can
spread among $10^9$ persons in $6 \pm 1$ steps on average. Taking $L = 6 \pm 1$ and $n = 10^9$ in
Eq. (3), we explore the possible regions in the p-K plane. They can be found numerically within
meaningful values of the parameters and are shown in Fig. 1. As long as p is not extremely small,
that is, in the small world region, an average number of contacts K of several tens per person is
sufficient for information to spread from one person to $10^9$ persons.

The solution in the C-K plane, where p is eliminated from Eq. (2) and Eq. (3), is shown in
Fig. 2. Though Eq. (3) holds only at small p, we can at least solve the two equations
simultaneously. Since C becomes small as p becomes large, the validity of the analysis is
lost for small C. Fig. 2 shows that K of several tens is sufficient for relatively large values
of C.

Fig. 1. p-K plots in SW model. Fig. 2. C-K plots in SW model.

Through these analyses, we conclude that roughly the following regions of parameters are needed,

C = 0.55 - 0.75,
p = 0.01 - 0.04,
K = 15 - 25,

in order that information can spread from one person to $10^9$ persons in about 6 steps on
Watts-Strogatz type small world networks.
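
The elimination of p can be sketched as follows: for a given K and network size n, solve L(p) = 6 for p by bisection and then evaluate C(p). This is a minimal illustration, not the procedure actually used for Fig. 1 and Fig. 2; it assumes the forms $C(p) = 3(K-2)(1-p)^3/(4(K-1))$ and $L(p) = (2n/K)F(nKp/2)$, and the helper names `clustering`, `path_length` and `solve_p` are hypothetical.

```python
import math


def clustering(K, p):
    """Assumed form of the clustering coefficient C(p)."""
    return 3.0 * (K - 2.0) / (4.0 * (K - 1.0)) * (1.0 - p) ** 3


def path_length(n, K, p):
    """Assumed form of the average separation L(p) for added shortcuts."""
    x = n * K * p / 2.0
    f = math.atanh(math.sqrt(x / (x + 2.0))) / (2.0 * math.sqrt(x * x + 2.0 * x))
    return (2.0 * n / K) * f


def solve_p(n, K, target_L, lo=1e-12, hi=1.0, iterations=200):
    """Bisection on a logarithmic scale for the p at which L(p) = target_L.

    L(p) decreases monotonically with p, so the root is bracketed when
    L(lo) > target_L > L(hi).
    """
    if not (path_length(n, K, lo) > target_L > path_length(n, K, hi)):
        raise ValueError("target_L is not bracketed by [lo, hi]")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if path_length(n, K, mid) > target_L:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    n = 1e9
    for K in (15, 20, 25):
        p = solve_p(n, K, 6.0)
        print(f"K={K}  p={p:.4f}  C={clustering(K, p):.3f}")
```

Scanning K in this way traces one curve in the p-K plane and, through C(p), the corresponding curve in the C-K plane; repeating it for L = 5 and L = 7 bounds the region.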

2.2 Population propagated at L = 6 

In this subsection we consider to how many persons information can propagate from a person in
exactly six steps. We look for the regions in the K-C plane for which n falls between
$10^7$ and $10^9$ at L = 6. For each K, typical values of C are listed in Table 1.

Table 1. C and K for n ~ $10^7$ - $10^9$

| K | 9 | 10 | 12 | 15 | 20 | 25 | 49 | 100 | 147 | 194 |
|---|---|---|---|---|---|---|---|---|---|---|
| C | 0.54 | 0.57 | 0.61 | 0.64 | 0.68 | 0.69 | 0.724 | 0.738 | 0.74 | 0.744 |



As K becomes large, n obviously becomes large. Table 1 shows that the required K is not so large
even for rather large C. Thus information can readily spread to about a billion persons in 6
steps, even if C takes a fairly large value.



Table 2. Empirical networks with large C > 0.5

| Network | Number of vertices | <K> | C | <L> | Index |
|---|---|---|---|---|---|
| Company directors | 7673 | 14.4 | 0.59 | | |
| Coauthorships in the SPIRES e-archive | 56627 | | 0.726 | | 1.2 |
| Collaboration net collected from math. journals | 70975 | | 0.59 | 2.1294 | 9.5 |
| Collaboration net collected from neurosci. journals | 209293 | | 0.76 | 2.4 | 6 |
| Metabolic network | 315 | | 0.59 | | |
| World Wide Web | 470000 | | 0.69/0.44 | 1.5/2.7 | 2.65 |



So far we have seen that six degrees of separation can be realized even for rather large C.
There are, however, not so many empirical networks with large C. Networks with rather large
C that have been discovered so far are listed in Table 2 [19], [14], [15], [16], [14], [10]. Here
blanks mean that the values are unknown, and the index in the rightmost column is the scale free
exponent. These networks mostly have a scale free nature, so that they are not Watts-Strogatz
type small world networks.



3 Propagation Coefficient Model 

We study Milgram-like propagation of information in a wider class of networks, including those
other than small world networks. First we focus our attention on any one node of a general
network, which is the node in the 0-generation. Next we explore all nodes connected with the
first node, which are the nodes in the 1-generation. Next we explore all nodes connected with the
nodes in the 1-generation apart from the nodes of the 0-generation and the 1-generation, which
are the nodes in the 2-generation. We continue this procedure until all nodes on the network are
covered. The procedure is effective in any complex network. There are only two generations in
complete graphs. The number of generations $N_G$ of a network, counting the 0-generation, is
larger than the diameter of the network by at most 1, so the maximal generation number G is at
most equal to the diameter. This picture of networks makes
the analyses of the propagation of information on the network manageable.
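
The generation picture is a breadth-first layering of the network around the first node. The following sketch shows one way to compute it; the adjacency-dictionary representation and the function names are assumptions made for illustration. The second function measures, for each generation, the average number of edges that lead from a node to the next generation.

```python
from collections import deque


def generations(adjacency, root):
    """Split the nodes reachable from `root` into generations G_0, G_1, ...

    `adjacency` maps each node to an iterable of its neighbours (undirected
    network). G_i holds the nodes at breadth-first distance i from the root.
    """
    distance = {root: 0}
    layers = [[root]]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                if distance[neighbour] == len(layers):
                    layers.append([])
                layers[distance[neighbour]].append(neighbour)
                queue.append(neighbour)
    return layers


def average_forward_edges(adjacency, layers):
    """Average number of edges from a node in G_i to nodes in G_{i+1}."""
    averages = []
    for current, following in zip(layers, layers[1:]):
        targets = set(following)
        edges = sum(1 for node in current for nb in adjacency[node] if nb in targets)
        averages.append(edges / len(current))
    return averages


if __name__ == "__main__":
    # A complete graph on four nodes has only two generations.
    complete = {i: [j for j in range(4) if j != i] for i in range(4)}
    print(generations(complete, 0))
    # A path 0-1-2 has three generations when started from an end node.
    path = {0: [1], 1: [0, 2], 2: [1]}
    layers = generations(path, 0)
    print(layers, average_forward_edges(path, layers))
```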

Now we introduce some geometrical quantities that are used in this article together with their
notation.

$G_i$ means the i-th generation and $n_i$ is the number of nodes in generation $G_i$.

N is the total number of nodes of a network, or the size of the network.

$k^{(j)}_{i,i+1}$ is the number of edges from node j in $G_i$ to nodes in $G_{i+1}$.

$k_{i,i}$ is the number of edges connected between nodes of the same generation $G_i$.

$C_i$ is the contribution to the clustering coefficient produced by edges in $G_i$.

$K^{(j)}$ is the degree of the node j.

We define the propagation coefficient from $G_i$ to $G_{i+1}$, denoted by $k_{i,i+1}$, as the
average of $k^{(j)}_{i,i+1}$ over the nodes j in $G_i$.
We make the following assumptions for simplicity of analyses.

1. The size of the network is infinite.

2. The parameter $q_j$ is the probability that a node j has two parents.

3. There is no backflow in the propagation of information.

4. The homogeneous hypothesis;

$$C_i = C = \mathrm{const.}, \qquad (4)$$
$$q_j = q = \mathrm{const.}, \ \text{but } q = 0 \text{ at the } G_0 \text{ generation}, \qquad (5)$$
$$K^{(j)} = K = \mathrm{const.} \ \text{(nearly equal degree)}. \qquad (6)$$

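Under these assumptions, the following relations hold;

$$k_{i,i+1} = \frac{1}{n_i} \sum_{j=1}^{n_i} k^{(j)}_{i,i+1}, \qquad (7)$$
$$K = 1 + \frac{2 k_{i,i}}{n_i} + k_{i,i+1}, \qquad (8)$$
$$n_{i+1} = k_{i,i+1}\, n_i\, (1-q), \qquad (9)$$
$$N = \sum_{i=0}^{d} n_i. \qquad (10)$$

Eq. (8) counts the K edges of a node in $G_i$: one edge to its parent, $2k_{i,i}/n_i$ edges on
average within $G_i$, and $k_{i,i+1}$ edges to $G_{i+1}$.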

In order to investigate the effect of the clustering coefficient upon the propagation of
information on a network, we consider when the clustering coefficient increases in this picture
of networks. Notice that there are two cases that contribute to the clustering coefficient. One
is the case that edges link nodes in the same generation. The other is the case that one node
has two parent nodes that are linked to each other. They are shown in Fig. 3.

Fig. 3. Two patterns that contribute to the clustering coefficient in the $G_i$ generation. Left:
links between nodes of the same generation. Right: the case that a node has two parents; the
probability that the right hand pattern occurs is q.



Since each node has K edges from assumption 4, $C_i$ is generally obtained by

$$C_i = \frac{t_i(q)}{{}_K C_2}, \qquad (11)$$

where ${}_K C_2 = K(K-1)/2$.



We investigate the two types of contributions to $t_i(q)$ in order.

First we consider the case on the left hand side of Fig. 3, that is to say $t_i(0)$. When one
edge between nodes of the same generation $G_i$ is added, the average probability that the edge
connects children nodes with a common parent node in $G_{i-1}$ is
$n_{i-1}\,{}_{k_{i-1,i}}C_2 / {}_{n_i}C_2$. So when $k_{i,i}$ edges in total are added in $G_i$,
the number of triangles in $G_i$ becomes

$$t_i(0) = k_{i,i}\,\frac{n_{i-1}\,{}_{k_{i-1,i}}C_2}{{}_{n_i}C_2} = \frac{k_{i,i}\, n_{i-1} k_{i-1,i}(k_{i-1,i}-1)}{n_i (n_i - 1)} = \frac{k_{i,i}(k_{i-1,i}-1)}{k_{i-1,i}\, n_{i-1} - 1}, \qquad (12)$$

since there are $n_{i-1}$ families in $G_i$, and $n_i = n_{i-1} k_{i-1,i}$, which is Eq. (9) at
q = 0, is used in the last equality. We can obtain explicit expressions for the first few
$t_i(0)$;

$$t_1(0) = k_{1,1}, \qquad t_2(0) = k_{2,2}\,\frac{k_{1,2}-1}{K k_{1,2} - 1}, \qquad t_3(0) = k_{3,3}\,\frac{k_{2,3}-1}{K k_{1,2} k_{2,3} - 1}. \qquad (13)$$



In the other case, the probability that two parents are linked in their generation $G_{i-1}$ is
$k_{i-1,i-1}/{}_{n_{i-1}}C_2$, and the number of nodes in $G_i$ that have two parent nodes is
$q\, n_{i-1} k_{i-1,i}$. Combining these two facts, the contribution $t'_i(q)$ of this case to
$t_i(q)$ is given by

$$t'_i(q) = (q\, n_{i-1} k_{i-1,i})\,\frac{k_{i-1,i-1}}{{}_{n_{i-1}}C_2} = \frac{2q\, k_{i-1,i}\, k_{i-1,i-1}}{n_{i-1} - 1} = \frac{2q\, k_{i-1,i}\, k_{i-1,i-1}}{n_{i-2}\, k_{i-2,i-1}(1-q) - 1}, \qquad (14)$$

where Eq. (9) is used in the last equality.
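Thus $t_i(q)$ is finally obtained by

$$t_i(q) = \left( \frac{k_{i,i}(k_{i-1,i}-1)}{k_{i-1,i}\, n_{i-1} - 1} \right) + \left( \frac{2q\, k_{i-1,i}\, k_{i-1,i-1}}{n_{i-1} - 1} \right), \qquad (15)$$

where the terms in parentheses show the contributions from the left hand figure and the right
hand one in Fig. 3, respectively.

Thus the clustering coefficient $C_i$ in the generation $G_i$ is obtained by

$$C_i = \frac{k_{i-1,i}\, n_{i-1}}{K(K-1)} \left( \frac{(k_{i-1,i}-1)(K-1-k_{i,i+1})}{k_{i-1,i}\, n_{i-1} - 1} + \frac{2q\,(K-1-k_{i-1,i})}{n_{i-1} - 1} \right), \qquad (16)$$

where $n_i = k_{i-1,i}\, n_{i-1}$ and the equation derived from Eq. (8),

$$k_{i,i} = \frac{n_i}{2}\,(K - 1 - k_{i,i+1}), \qquad (17)$$

is used.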

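By using $n_0 = 1$ and $k_{0,1} = n_1 = K$, we can give explicit expressions for the first few
$C_i$;

$$C_1 = 1 - \frac{k_{1,2}}{K-1}, \qquad C_2 = \frac{k_{1,2}}{K-1} \left( \frac{(k_{1,2}-1)(K-1-k_{2,3})}{K k_{1,2} - 1} + \frac{2q\,(K-1-k_{1,2})}{K-1} \right). \qquad (18)$$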



Using the homogeneous hypothesis $C_i = C = \mathrm{const.}$, we can express every $k_{i-1,i}$ in
terms of C. By way of example, we obtain

$$k_{1,2} = (K-1)(1-C), \qquad k_{2,3} = K - 1 - \frac{C\,(1 - 2q(1-C))}{1-C}\cdot\frac{K(K-1)(1-C) - 1}{(K-1)(1-C) - 1}. \qquad (19)$$



In general we notice that the following recursion relation is satisfied;

$$A_{i,i+1} = \frac{(k_{i-1,i}\, n_{i-1} - 1)\left[ C K (K-1)(n_{i-1} - 1) - 2q\, A_{i-1,i}\, k_{i-1,i}\, n_{i-1} \right]}{k_{i-1,i}\,(k_{i-1,i} - 1)\, n_{i-1}\,(n_{i-1} - 1)}, \qquad (20)$$

where

$$A_{i,i+1} = K - 1 - k_{i,i+1} \qquad (21)$$

means the number of edges per node connected within the same generation $G_i$. We can numerically
solve this recursion relation with respect to $k_{i-1,i}$.

In order to get numerical values of $k_{i-1,i}$, we need to fix three parameters, q, K and C.
When $k_{i-1,i}$ is calculated, N is found from Eq. (9) and Eq. (10). If the network topology
were a tree structure, information would spread over $\sim K^l$ persons in l steps, as stated
before. In the present case, it is inferred that the clustering coefficient has a strong
influence on the propagation of information, so that the spread of information would be
restricted to rather fewer persons in networks with large C. We consider how much the spread of
information is restricted owing to C. We measure it by the propagation ratio R, which is the
ratio of the population N actually reached in the present case to $K^l$;

$$R = \frac{N}{K^l}. \qquad (22)$$
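
The recursion can be iterated generation by generation. The sketch below is one possible reading of the model, not the authors' program: it assumes that setting $C_i = C$ in every generation gives $A_{i,i+1} = \frac{n_i - 1}{k_{i-1,i} - 1}\left[\frac{CK(K-1)}{n_i} - \frac{2q A_{i-1,i}}{n_{i-1} - 1}\right]$ with $A_{i,i+1} = K - 1 - k_{i,i+1}$, that $n_0 = 1$, $n_1 = K$ and q = 0 at $G_0$, and, as a simplification, that nodes are counted with $n_{i+1} = k_{i,i+1} n_i$ (the factor 1 - q is left out). The function name `propagation_ratio` is hypothetical.

```python
def propagation_ratio(K, C, q, steps=6):
    """Return R = N / K**steps, or None if the recursion leaves the admissible region.

    Conventions of this sketch (assumptions): n_0 = 1, n_1 = K, q = 0 at G_0,
    and n_{i+1} = k_{i,i+1} * n_i when counting nodes.
    """
    n_prev, n_cur = 1.0, float(K)   # n_{i-1} and n_i, starting at i = 1
    k_prev = float(K)               # k_{i-1,i}, starting with k_{0,1} = K
    a_prev = 0.0                    # A_{i-1,i}; not used at i = 1
    total = n_prev + n_cur          # N accumulates n_0 + n_1 + ...
    for i in range(1, steps):
        if k_prev <= 1.0:
            return None             # the recursion divides by k_{i-1,i} - 1
        # Contribution of nodes with two linked parents (absent at i = 1).
        two_parent = 0.0 if i == 1 else 2.0 * q * a_prev / (n_prev - 1.0)
        a_cur = (n_cur - 1.0) / (k_prev - 1.0) * (C * K * (K - 1.0) / n_cur - two_parent)
        k_cur = K - 1.0 - a_cur     # k_{i,i+1} = K - 1 - A_{i,i+1}
        if k_cur <= 0.0:
            return None             # propagation coefficient must stay positive
        n_next = k_cur * n_cur
        total += n_next
        n_prev, n_cur, k_prev, a_prev = n_cur, n_next, k_cur, a_cur
    return total / float(K) ** steps


if __name__ == "__main__":
    for C in (0.0, 0.1, 0.2, 0.28):
        print(C, propagation_ratio(100, C, 0.0))
```

With C = 0 the sketch reduces to the tree count, for example R of about 0.664 for K = 10, and R decreases as C grows until some $k_{i,i+1}$ ceases to be positive, at which point the function returns None.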



We evaluate R at l = 6 over a wide range of the three parameters q, K and C for which
$k_{i-1,i}$ stays positive. The results are listed in Table 3.

Table 3. Propagation ratio and $C_{min}$ in parameter space (q, K) at L = 6

| K/q | q=0 | q=0.1 | q=0.2 | q=0.3 | q=0.4 | q=0.5 |
|---|---|---|---|---|---|---|
| K = 10^3 | 0.996-0.229 | 0.996-0.032 | 0.996-0.020 | 0.996-0.025 | 0.996-0.028 | 0.996-0.013 |
| | C = 0.29 | C = 0.33 | C = 0.38 | C = 0.42 | C = 0.46 | C = 0.51 |
| K = 10^2 | 0.961-0.035 | 0.961-0.037 | 0.961-0.025 | 0.961-0.029 | 0.961-0.016 | 0.961-0.018 |
| | C = 0.28 | C = 0.32 | C = 0.37 | C = 0.41 | C = 0.46 | C = 0.50 |
| K = 50 | 0.876-0.004 | 0.876-0.041 | 0.876-0.028 | 0.876-0.001 | 0.876-0.001 | 0.876-0.023 |
| | C = 0.27 | C = 0.31 | C = 0.36 | C = 0.4 | C = 0.46 | C = 0.49 |
| K = 28 | 0.82-0.018 | 0.82-0.0015 | 0.82-0.0054 | 0.82-0.0102 | 0.82-0.0147 | 0.82-0.0175 |
| | C = 0.27 | C = 0.32 | C = 0.36 | C = 0.4 | C = 0.44 | C = 0.48 |
| K = 10 | 0.664-0.046 | 0.664-0.041 | 0.664-0.039 | 0.664-0.006 | 0.664-0.006 | 0.664-0.030 |
| | C = 0.21 | C = 0.25 | C = 0.29 | C = 0.33 | C = 0.39 | C = 0.42 |

We find the following facts by observing the entire parameter region of (q, K)-space;

(1) 0.29 < C < 0.51,

(2) N ranges from nearly 100 percent down to a few percent of $K^d$ at d = 6,

(3) ^i,j ~ 2*< 

Information can spread over a large portion of $K^l$, even when C has a rather large value. This
is like the "small world network" proposed by Watts and Strogatz [2],[3]. From Table 3, we find
that a person needs to have only about 50 acquaintances in order that information can spread
from one person to a few hundred million persons, even in the worst case with the largest
clustering coefficient. This condition may be satisfied rather easily in actual society.

4 Some Empirical Networks 

We consider data on Mixi [12] and the Bacon game [17] as suitable networks for our aim.

4.1 Mixi Data

Table 4 [11] can be interpreted as showing to how many persons a person can convey information
at each generation. The first three columns of Table 4 are based on the data described by
Masuda [11]. The last column is the average propagation coefficient at each generation. Every
column of Table 4 except the first one is shown in Fig. 4, where the length means the distance
on the graph between a root, that is, a first person, and a descendant. Notice that the length
is the same as the generation number i. The distribution of the number of new nodes at $G_i$ is
bell-shaped with the peak at length = 5. Its cumulative distribution has a logistic shape, which
shows that nearly all of the population is included up to length = 6; it rises drastically at
length = 5 ± 1. It is, meanwhile, speculated that some hubs contribute to the distribution. This
suggests that this network is scale free, as also suggested by Yuta et al. [12]. The propagation
coefficient exhibits a strange behavior at first glance. We speculate that this is an incidental
event, as will be observed in the next subsection.
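
The last two columns of Table 4 follow from the numbers of new nodes alone. The sketch below recomputes them, taking the propagation coefficient at generation i to be the ratio $n_{i+1}/n_i$ of successive new-node counts; the list is copied from Table 4 and the function names are hypothetical.

```python
# Number of new nodes at generations 0..13, copied from Table 4 (Mixi data).
NEW_NODES = [1, 28, 265, 6474, 89295, 176867, 73876, 12362, 1451, 142, 30, 4, 5, 2]


def cumulative_counts(new_nodes):
    """Total number of nodes reached up to and including each generation."""
    totals, running = [], 0
    for count in new_nodes:
        running += count
        totals.append(running)
    return totals


def propagation_coefficients(new_nodes):
    """Ratio n_{i+1} / n_i for each generation that has a successor."""
    return [nxt / cur for cur, nxt in zip(new_nodes, new_nodes[1:])]


if __name__ == "__main__":
    totals = cumulative_counts(NEW_NODES)
    ratios = propagation_coefficients(NEW_NODES)
    for i, count in enumerate(NEW_NODES):
        ratio = f"{ratios[i]:.7f}" if i < len(ratios) else ""
        print(i, count, totals[i], ratio)
```

The ratios rise up to generation 2 and then fall below 1 from generation 5 onwards, which is the finite size effect discussed in this section.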



Table 4. Data on Mixi by ATR group

| generation i | number of new nodes at $G_i$ | number of total nodes | propagation coefficient |
|---|---|---|---|
| 0 | 1 | 1 | 28 |
| 1 | 28 | 29 | 9.4642857 |
| 2 | 265 | 294 | 24.430189 |
| 3 | 6474 | 6768 | 13.792864 |
| 4 | 89295 | 96063 | 1.9807044 |
| 5 | 176867 | 272930 | 0.417692 |
| 6 | 73876 | 346806 | 0.167334452 |
| 7 | 12362 | 359168 | 0.1173758 |
| 8 | 1451 | 360619 | 0.0978635 |
| 9 | 142 | 360761 | 0.2112676 |
| 10 | 30 | 360791 | 0.133333333 |
| 11 | 4 | 360795 | 1.250 |
| 12 | 5 | 360800 | 0.4 |
| 13 | 2 | 360802 | |

Fig. 4. Propagation in Mixi.

Since we can find $k_{i-1,i}$ and $n_i$ from Table 4, once K is determined we can evaluate the
clustering coefficients by Eq. (16) for various values of q. While the average degree is
<K> = 10.4 in the data for Mixi according to Yuta et al. [12], we assume <K> = $k_{0,1}$ so as to
get adequate values for the clustering coefficient. This is the assumption that the number of
edges leaving the first person is K. Though, needless to say, that is not always the case, this
assumption leads to meaningful clustering coefficients in the present case. Their values are
given in Table 5. They roughly show that the following relation holds;

$$C_i \sim 10^{-(G_i + 1)}. \qquad (23)$$

When i > 3, the structure of the Mixi network is practically that of a random graph. This also
means that $C_i$ is not constant, and so the homogeneous hypothesis is rejected in the present
case. This is due to the almost scale free nature of the Mixi network and to finite size effects.
Though the considerations in section 2 are not adequate for such cases, they would be adequate
for Watts-Strogatz type networks.

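Table 5. Clustering coefficients for Mixi data based on propagation coefficients with
<K> = $k_{0,1}$

| | q=0.5 | q=0.8 |
|---|---|---|
| $G_i$ = 1 | none | none |
| $G_i$ = 2 | $9.6 \times 10^{-4}$ | $1.4 \times 10^{-3}$ |
| $G_i$ = 3 | $7.6 \times 10^{-5}$ | $8.3 \times 10^{-5}$ |
| $G_i$ = 4 | $7.4 \times 10^{-6}$ | $8.8 \times 10^{-6}$ |
| $G_i$ = 5 | $5.7 \times 10^{-7}$ | $7.5 \times 10^{-7}$ |
| $G_i$ = 6 | none | $1.9 \times 10^{-8}$ |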



4.2 Bacon Game 

We can draw a good deal of information from the web page [17]. Fig. 5 displays figures similar
to Fig. 4 for the case of the Bacon game. The upper figures in Fig. 5 are drawn when the starting
person is Bacon himself, and the lower ones when the starting person is the one with the
longest average path length on the actor network. They are quite alike apart from the position
of the peak of the distribution (both ends) or of the maximum gradient (the middle). Key actors
are connected with all the actors by short distances, so that the distance at the peak is also
small. In particular, the propagation coefficient has its peak at the first step for the leading
actor Bacon. These observations demonstrate that the homogeneous hypothesis is not valid, as in
subsection 4.1.

In these empirical networks, we find that $C_i$ does not take a constant value, mainly due to
finite size effects and an almost scale free nature. A simple estimation of $C_i$ occasionally
leads to a negative value owing to these. Such a problem, in which $C_i$ becomes negative, crops
up when we take K = 10.4 in subsection 4.1. On closer investigation, a finite size effect brings
about $k_{i-1,i} < 1$, and hubs alone are connected to extraordinarily many persons. These cause
negative $C_i$.

In order to solve these problems, we should consider this subject in more detail beyond the
homogeneous hypothesis, including the distribution of $C_j$ over individuals, the degree
correlation between neighboring persons and so on.

Fig. 5. Propagation in the Bacon game. Left: the number of new nodes that appear at a generation
$G_i$. Middle: the total number of nodes that have appeared up to the generation $G_i$. Right:
the propagation coefficient.

5 Summary

In this article we investigated three points about six degrees of separation proposed by Milgram.
The first is the analysis of the Watts-Strogatz type small world network, which is based on
analytic expressions for the clustering coefficient and the average path length. Second, we
discussed the propagation coefficient model based on the homogeneous hypothesis for information
transmission. Though empirical networks do not necessarily support the hypothesis, we established
a formalism to calculate the propagation coefficient and related quantities in schemes where
information propagates from generation to generation. Third, some empirical networks were
investigated, where the validity of the homogeneous hypothesis was examined so that some points
at issue were made clear. Numerical analyses were carried out in all three subjects. The
knowledge gained through these investigations is summarized as follows.

1. The effect of the clustering coefficient on the diffusion of information over networks of
human relations is not so crucial. The clustering coefficient only reduces the population that
can receive information to a few percent of that in a tree graph. Though this is a decrease of up
to two orders of magnitude in the population that can receive information, each person has only
to have dozens of contacts for six degrees of separation. Thus "six degrees of separation" is not
such an amazing phenomenon.

2. The homogeneous hypothesis should be made more accurate. By way of example, the distribution
of the clustering coefficient over nodes, the degree correlation between neighbors, the degree
distribution and so on should be taken into consideration. They are future issues to be
addressed.



3. The finite size effect of networks has to be considered in the analysis of empirical data.

These are the summaries and future issues of this article.